A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Authors

  • Iman Sahraei Dehmajnoonie, Science and Research Branch, Islamic Azad University, Kerman, Iran
  • Keivan Borna, Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
  • Vahid Hajihashemi, Student Member, IEEE
  • Zeinab Hassani, Department of Computer Science, Kosar University of Bojnourd, Iran
Abstract:

Spam is unwanted e-mail that harms communication around the world. It is a growing problem for personal e-mail, so detecting it is essential. Machine learning is well suited to this problem: because it is adaptive, it can learn the patterns that classification requires, and it has shown good results. However, spam detection involves a large number of features, and these play an essential role in detection efficiency. In this article, we propose a feature selection method for e-mail spam that hybridizes optimization algorithms with machine-learning classifiers. Binary Whale Optimization (BWO) and Binary Grey Wolf Optimization (BGWO) algorithms are used for feature selection, and K-Nearest Neighbor (KNN) and Fuzzy K-Nearest Neighbor (FKNN) algorithms are applied as the classifiers. The proposed method is tested on the "SPAMBASE" dataset from the UCI Machine Learning Repository, and the experimental results show a highest accuracy of 97.61% on this dataset. The results indicate that the proposed method is suitable and delivers excellent performance compared with other methods.
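
The abstract names the components but not their exact formulation. The following is a minimal, standard-library-only Python sketch of the general wrapper idea, assuming a common binary grey wolf variant: each wolf is a 0/1 feature mask, masks are scored by KNN or fuzzy KNN error on a held-out split, and a sigmoid transfer function turns the continuous grey wolf update into bit probabilities. The function names, the fitness weighting (0.99 on error, 0.01 on subset size), the sigmoid slope of 10, k = 3 and the fuzzifier m = 2 are all illustrative assumptions, not the paper's settings. BWO is not shown.

```python
import math
import random
from collections import Counter


def distance(a, b, mask):
    """Euclidean distance computed only over features whose mask bit is 1."""
    return math.sqrt(sum((x - y) ** 2 for x, y, keep in zip(a, b, mask) if keep))


def nearest(train_X, x, mask, k):
    """Indices of the k training samples closest to x under the masked distance."""
    return sorted(range(len(train_X)), key=lambda i: distance(train_X[i], x, mask))[:k]


def knn_predict(train_X, train_y, x, mask, k=3):
    """Plain KNN: majority vote among the k nearest neighbours."""
    idx = nearest(train_X, x, mask, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]


def fknn_predict(train_X, train_y, x, mask, k=3, m=2.0):
    """Fuzzy KNN with crisp training memberships: each neighbour adds
    1 / d^(2/(m-1)) to its class, and the class with the largest total wins."""
    votes = {}
    for i in nearest(train_X, x, mask, k):
        d = distance(train_X[i], x, mask)
        votes[train_y[i]] = votes.get(train_y[i], 0.0) + 1.0 / (d ** (2 / (m - 1)) + 1e-9)
    return max(votes, key=votes.get)


def fitness(mask, train_X, train_y, val_X, val_y, classify, weight=0.99):
    """Wrapper fitness to minimise: weighted validation error plus the fraction
    of features kept (an assumed weighting, not taken from the paper)."""
    if not any(mask):
        return float("inf")  # an empty feature subset cannot classify anything
    errors = sum(classify(train_X, train_y, x, mask) != y for x, y in zip(val_X, val_y))
    return weight * errors / len(val_y) + (1 - weight) * sum(mask) / len(mask)


def bgwo_select(train_X, train_y, val_X, val_y, classify,
                n_wolves=8, n_iter=20, seed=0):
    """Binary grey wolf search over feature masks; returns (best_mask, best_fitness)."""
    if n_wolves < 3:
        raise ValueError("need at least 3 wolves for the alpha, beta and delta leaders")
    rng = random.Random(seed)
    n_feat = len(train_X[0])

    def score(mask):
        return fitness(mask, train_X, train_y, val_X, val_y, classify)

    # Start from the all-features baseline so the result is never worse than it.
    best_mask = [1] * n_feat
    best_score = score(best_mask)
    wolves = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_wolves)]
    for t in range(n_iter + 1):
        scored = sorted(((score(w), w) for w in wolves), key=lambda p: p[0])
        if scored[0][0] < best_score:
            best_score, best_mask = scored[0]
        if t == n_iter:
            break  # the final pack has been scored; no further moves
        leaders = [w for _, w in scored[:3]]  # alpha, beta and delta wolves
        a = 2 - 2 * t / n_iter  # control parameter decreases linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            new = []
            for d in range(n_feat):
                pull = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pull += leader[d] - A * abs(C * leader[d] - wolf[d])
                # Sigmoid transfer maps the averaged continuous step to a bit probability.
                prob = 1 / (1 + math.exp(-10 * (pull / 3 - 0.5)))
                new.append(1 if rng.random() < prob else 0)
            new_wolves.append(new)
        wolves = new_wolves
    return best_mask, best_score
```

Passing `fknn_predict` instead of `knn_predict` as `classify` runs the same search with the fuzzy classifier. Seeding the best solution with the all-features mask is a design choice in this sketch: it guarantees the returned subset scores no worse than using every feature on the validation split.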

Similar Resources

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

E-mail is one of the fastest and most economical forms of communication. The growing number of e-mail users has increased spam in recent years. Spam not only wastes users' money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of networks. Spam developers are always trying to find ways to escape the existing filters; therefore new filters to de...

A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification

Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high feature-space dimensionality caused by the massive number of e-mails and features still exists. To overcome this limitation of SVM, reduce the computational complexity (e...

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed that uses the PageRank algorithm to select the optimal subset of features and to compute their weights for web page classification. To evaluate the proposed approach, multiple experiments are performed on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups, using accuracy as the main criterion. By analy...

H-BwoaSvm: A Hybrid Model for Classification and Feature Selection of Mammography Screening Behavior Data

Breast cancer is one of the most common cancers in the world. Early detection of cancer significantly reduces morbidity rates and treatment costs. Mammography is a known, effective method for diagnosing breast cancer. One way to identify mammography screening behavior is to evaluate women's awareness of participating in mammography screening programs. Today, intelligent systems could...

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the majority class and may even treat minority-class data as outliers. Text data is one of t...

Journal

Volume 31, Issue 2

Pages 165-173

Publication date: 2020-04-01